Abstract

We analyze the entire publication database of the American Physical Society, generating longitudinal
(50 years) citation networks geolocalized at the level of single urban areas. We define
the knowledge diffusion proxy and the scientific production ranking algorithms to capture the
spatio-temporal dynamics of Physics knowledge worldwide. By using the knowledge diffusion proxy we
identify the key cities in the production and consumption of knowledge in Physics as a function of
time. The results from the scientific production ranking algorithm allow us to characterize the top
cities for scholarly research in Physics. Although we focus on a single dataset concerning a specific
field, the methodology presented here opens the path to comparative studies of the dynamics of
knowledge across disciplines and research areas.

Over the last decade, the digitalization of publication datasets has propelled bibliographic studies, allowing
for the first time access to the geospatial distribution of millions of publications and citations at different
granularities [1, 2, 3, 4, 5, 6, 7, 8] (see [9] for a review). More precisely, authors' names, affiliations,
addresses, and references can be aggregated at different scales and used to characterize the publication and
citation patterns of single papers [10, 11], journals [12, 13], authors [14, 15, 16], institutions [17], cities [18],
or countries [19]. The sheer size of the datasets also allows system-level analyses of research production
and consumption [20], migration of authors [21, 22], and change in production in several regions of the
world as a function of time, just to name a few examples. At the same time, those analyses have
spurred intense research activity aimed at defining metrics able to capture the importance/ranking of
authors, institutions, or even entire countries [23, 24, 14, 15, 17, 25, 26, 27, 28, 29]. Whereas such large
datasets are extremely useful in understanding scholarly networks and in charting the creation of knowledge,
they also point out the limits of our conceptual and modeling frameworks [30] and call for a
deeper understanding of the dynamics ruling the diffusion and fruition of knowledge across the social
and geographical space.

In this paper we study citation patterns of articles published in the American Physical Society (APS) journals
in a fifty-year time interval (1960-2009) [31]. Although in the early years of this period the dataset was
obviously biased toward the scholarly activity within the USA, in the last twenty years only about 35% of
the papers were produced in the USA. A similar share of production has been observed in databases that
include multiple journals and disciplines [19, 17]. Indeed, the journals of the APS are considered worldwide
as reference publication venues that represent well the international research activity in Physics. Furthermore,
this dataset does not bundle different disciplines and publication languages, providing a homogeneous
dataset concerning Physics scholarly research. For each paper we geolocalize the institutions contained in
the authors' affiliations. In this way we are able to associate each paper in the database with specific urban
areas. This defines a time-resolved, geolocalized citation network including 2,307 cities around the world
engaged in the production of scholarly work in the area of Physics. Following previous works [1, 8], we
assume that the number of given or received citations is a proxy of knowledge consumption or production,
respectively. More precisely, we assume that citations are the currency traded between parties in the knowledge
exchange. Nodes that receive citations export their knowledge to others. Nodes that cite other works
import knowledge from others. According to this assumption we classify nodes considering the unbalance
in their trade. Knowledge producers are nodes that are cited (export) more than they cite (import). On
the contrary, we label as consumers nodes that cite (import) more than they are cited (export). Using this
classification, we define the knowledge diffusion proxy algorithm to explore how scientific knowledge flows
from producers to consumers. This tool explicitly assumes a systemic perspective of knowledge diffusion,
highlighting the global structure of scientific production and consumption in Physics.

The temporal analysis reveals interesting patterns and the progressive delocalization of knowledge producers.
In particular, we find that in the last twenty years the geographical distribution of knowledge production
has drastically changed. A paramount example is the transition in the USA from a knowledge production
localized around major urban areas on the east and west coasts to a broad geographical distribution
where a significant part of the knowledge production now also occurs in the midwestern and southern
states. Analogously, we observe the early-90s dominance of the UK and Northern Europe give way
to an increase of production from France, Italy and several regions of Spain. Interestingly, the last decade
shows that several of China's urban areas are emerging as the largest knowledge consumers worldwide.
The reasons underlying this phenomenon may be related to the significant growth of the economy and of the
research and development sector in China in the early 21st century [32]. This positive stimulus also pushed
up scientific consumption, with a large number of papers citing work from other world areas. Indeed,
the increase of publications is associated with an increase of the citation unbalance, moving China to the top
rank as a consumer, since the recent influx of its new papers has not yet had the time to accumulate citations.

Although the knowledge diffusion proxy provides a measure of knowledge production and consumption,
it may be inadequate for ranking the most authoritative cities for Physics research. Indeed, a
key issue in appropriately ranking knowledge production is that not all citations have the same weight.
Citations coming from authoritative nodes are heavier than those coming from less important nodes, thus
defining a recursive diffusion of ranking among the nodes of the citation network. In order to include this element
in the ranking of cities we propose the scientific production ranking algorithm. This tool, inspired by
PageRank [33], allows us to define the rank of each node as a function of time, going beyond the knowledge
diffusion proxy or simple local measures such as citation counts or the h-index [14]. In this algorithm the importance
of each node diffuses through the citation links. The rank of a node is determined by the rank of the nodes
that cite it, recursively, thus implicitly weighting citations from highly (lowly) ranked nodes differently.
Also in this case we observe noticeable changes in the ranking of cities over the years. For instance, the
presence of both European and Asian cities in the top 100 list increases by 50% in the last 20 years. These
findings suggest that the Internet, digitalization and the accessibility of publications are creating a more level
playing field where the dominance of specific areas of the world is being progressively eroded to the advantage
of a more widespread and complex knowledge production and consumption dynamic.



Results 

We focus our analysis on the APS dataset [31]. It contains all the papers published by the APS from 1893 to
2009. We consider only the last 50 years due to the incomplete geolocalization information available for the
early years. During this period, the large majority of indexed papers, 97.47%, contain complete information
such as author names, journal of publication, day of publication, list of affiliations and list of citations to
other articles published in APS journals. We geolocalized 96.97% of papers at the urban area level with an
accuracy of 98.5%. We refer the reader to the Methods section and to the Supplementary Information (SI)
for the detailed description of the dataset and the techniques developed to geolocalize the affiliations.

In total, only 43% of papers have been produced inside the USA. Interestingly, over time this fraction has
decreased. For example, in the 1960s it was 85.59%, while in the last 10 years it decreased to just 36.67%. While
one might assume that the APS dataset is biased toward the USA scientific community, the percentage of
publications contributed by the USA in APS journals after 1990 is almost the same as in other publication
datasets [19, 17]. These alternative datasets contain journals published all over the world and mix different
scientific disciplines. This supports the idea that the APS journals are now attracting the worldwide physics
community independently of nationality, and fairly represent the world production and consumption
of Physics. It is not possible to provide a quantitative analysis of a possible nationality bias and disentangle
it from an actual change in the dynamics of knowledge production. For this reason, and in order to minimize
any bias, we focus our analysis on the last 20 years of data.

In order to construct the geolocalized citation network we consider nodes (urban areas) and directed links
representing the presence of citations from a paper with affiliation in one urban area to a paper with affiliation
in another urban area. For example, if a paper written in node $i$ cites one paper written in node $j$,
there is a link from $i$ to $j$, i.e., $j$ receives a citation from $i$ and $i$ sends a citation to $j$. Each paper may
have multiple affiliations, and therefore citations have to be proportionally distributed between all the nodes
of the papers. For this reason we weight each link in order to take into account the presence of multiple
affiliations and multiple citations. In a given time window, the total number of citations that papers written
in $j$ receive from papers written in $i$ is the weight of the link $i \to j$, and the total number of citations
sent by papers written in $j$ to papers written in $k$ is the weight of the link $j \to k$. For instance, if in a
time window $t$ there is one paper written in node $j$, which cites two papers written in node $k$ and is cited
by three papers written in node $i$, then $w_{jk} = 2$ and $w_{ij} = 3$; we add all such weights for each paper written
in node $j$ to obtain the weights of the links. For papers written in multiple cities, say $j_1, j_2$, the weight
is counted equally. The time window we use in this manuscript is one year. We show an example of
the network construction in Figure 1.
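
The aggregation from paper-level citations to city-level link weights can be illustrated with a short Python sketch. It is not the authors' code: the function name `city_citation_weights`, the input layout, and in particular the normalization (each citation carries unit weight, split evenly over all pairs of citing and cited cities) are assumptions made for illustration. The example data reproduce the papers A, B and C of Figure 1.

```python
from collections import defaultdict
from itertools import product


def city_citation_weights(paper_cities, citations):
    """Aggregate paper-level citations into weighted city-to-city links.

    paper_cities: dict mapping a paper id to the list of cities in its affiliations.
    citations: iterable of (citing_paper, cited_paper) pairs within one time window.
    Returns a dict {(i, j): w_ij}, where i is the citing city and j the cited city.
    """
    weights = defaultdict(float)
    for citing, cited in citations:
        sources = sorted(set(paper_cities[citing]))
        targets = sorted(set(paper_cities[cited]))
        # Assumption: one citation has unit weight, shared evenly among all
        # (citing city, cited city) pairs of the two papers.
        share = 1.0 / (len(sources) * len(targets))
        for i, j in product(sources, targets):
            weights[(i, j)] += share
    return dict(weights)


# Papers of Figure 1: paper A cites papers B and C.
paper_cities = {
    "A": ["Ann Arbor", "Los Alamos", "New York"],
    "B": ["Madrid", "Rome"],
    "C": ["Oxford", "Princeton"],
}
w = city_citation_weights(paper_cities, [("A", "B"), ("A", "C")])
for link in sorted(w):
    print(link, round(w[link], 3))
```

With this normalization the two citations of paper A generate twelve directed links of weight 1/6 each, so the total weight equals the number of citations. Other splitting conventions would change the individual weights but not the topology of the network.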

In order to define the main actors in the production and consumption of Physics, we consider citations as a
currency of trade. This analogy allows us to immediately grasp the meaning of and distinction between producers
and consumers of scientific knowledge. Nodes that receive citations export their knowledge to the citing
nodes. Instead, nodes that cite papers produced by other nodes of the network import knowledge from
the cited nodes. Measuring the unbalance of trade in citations, we define producers as cities that export
more than they import, and consumers as cities that import more than they export. More precisely, we can

Figure 1: Projecting a paper citation relationship into a city-to-city citation network. (A) Paper A,
written by authors from Ann Arbor, Los Alamos and New York, cites paper B, written by authors from
Rome and Madrid, and paper C, written by authors from Oxford and Princeton. (B) In the city-to-city
citation network, directed links from Ann Arbor to Madrid, Rome, Oxford and Princeton are generated, and
similarly Los Alamos and New York are connected to the above four cited cities.



measure the total knowledge imported by each urban area $i$ as $\sum_j w_{ij}$ and the total export as $\sum_j w_{ji}$ in a
given year. Those measures, however, acquire specific meaning when considered relative to the total trade
of physics knowledge worldwide in the same year, i.e., the total number of citations worldwide $S = \sum_{ij} w_{ij}$.
The relative trade unbalance of each urban area $i$ is then:

$$\Delta s_i = \frac{\sum_j w_{ji} - \sum_j w_{ij}}{S}. \qquad (1)$$

A negative or positive value of this quantity indicates whether the urban area $i$ is a consumer or a producer,
respectively. In Figure 2-A we show the worldwide geographical distribution of producer (red) and consumer
(blue) urban areas for 1990 and 2009. Interestingly, during the 90s the production of Physics knowledge
was highly localized in a few cities on the eastern and western coasts of the USA and in a few areas of Great
Britain and Northern Europe. In 2009 the picture is completely different, with many producer cities in central
and southern parts of the USA, Europe and Japan. It is interesting to note that although the fraction of papers
produced in the USA is generally decreasing or stable, many more cities in the USA acquire the status of
knowledge producers. This implies that the quality of knowledge production from the USA is increasing
and thus attracting more citations. This makes it clear that the knowledge produced by an urban area cannot
be measured only by the raw number of papers. Citations are a more appropriate proxy
that encodes the value of the products. They serve as an approximation of the actual flow of knowledge.
Figure 2-A also makes it clear that cities in China play the role of major consumers in both
1990 and 2009. We also observe that cities in other countries, like Russia and India, consumed less in 2009
than in 1990. In other words, in 2009 both the production and consumption of knowledge are less concentrated
in specific places and generally spread more evenly geographically. In order to provide visual support for
this conclusion we show in Figure 2-B the geographical distribution of producers and consumers inside
the USA. The two maps make evident the drift of knowledge production from the two coastal areas of
the USA to the midwest, central and southern states. Similarly, in Figure 2-C we plot the same information
for western Europe. In 1990 only a few urban areas in Germany and France were clearly producers.
By 2009 this dominance has been consistently eroded by Italy, Spain and a more widespread geographical
distribution of producers in France, Germany and the UK.
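
As a concrete illustration of the producer/consumer classification, the following Python sketch computes the relative trade unbalance of each city, that is, citations received (export) minus citations sent (import), divided by the total number of citations $S$ in the time window. The function name `relative_trade_unbalance` and the toy weights are hypothetical and are not taken from the APS data.

```python
from collections import defaultdict


def relative_trade_unbalance(weights):
    """Return {city: (export - import) / S} for one time window.

    weights: dict {(i, j): w_ij}, with w_ij the citations sent by city i to city j.
    """
    imports = defaultdict(float)  # citations sent by a city (knowledge imported)
    exports = defaultdict(float)  # citations received by a city (knowledge exported)
    for (i, j), w_ij in weights.items():
        imports[i] += w_ij
        exports[j] += w_ij
    total = sum(weights.values())  # S, all citations in the window
    cities = set(imports) | set(exports)
    return {c: (exports[c] - imports[c]) / total for c in cities}


# Toy weights (hypothetical numbers): X and Z mostly cite Y.
weights = {("X", "Y"): 3.0, ("Y", "X"): 1.0, ("Z", "Y"): 2.0}
unbalance = relative_trade_unbalance(weights)
for city in sorted(unbalance):
    role = "producer" if unbalance[city] > 0 else "consumer"
    print(city, round(unbalance[city], 3), role)
```

In this toy case Y is the only producer (positive unbalance), while X and Z are consumers. The unbalances sum to zero by construction, since every citation is exported by one city and imported by another.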

Knowledge diffusion proxy. 

The definition of producers and consumers is based on a local measure that does not capture all
possible correlations and bonds between nodes that are not directly connected. This might result in a partial
view and description of the system, especially when connectivity patterns are complex [36, 37, 38, 39, 40].
Interestingly, a close analysis of each citation network (see Figure 3) clearly shows that citation patterns
indeed have all the hallmarks of complex systems [36, 37, 38, 39, 40], especially in the last two decades.
The system is self-organized: there is no central authority that assigns citations and papers to cities, there
is no blueprint of the system's interactions, and, as clearly shown in Figure 3-C, the statistical characteristics
of the system are described by heavy-tailed distributions [36, 37, 38, 39, 40]. Not surprisingly, the
level of complexity of the system has increased with time. In Figure 3-A we plot the most statistically
significant connections of the citation network between cities inside the USA in 1960, 1990 and 2009. We filter
links by using the backbone extraction algorithm [41], which preserves the relevant connections of weighted
networks while removing the least statistically significant ones. We visualize each filtered network by using
a bundled representation of links [42]. The direction of each weighted link goes from blue (citing) to red
(cited). Similarly, in Figure 3-B we visualize the most significant links between cities in Europe (the European
Union's 27 countries, as well as Switzerland and Norway). It is clear from Figure 3-A that in 1960
the citation patterns inside the USA were limited to a few cities, and in Europe only a few cities were connected.
Instead, in 1990 and 2009 we register an increase in the interactions among a larger number of cities.
The observed temporal trend is well known and valid not just for Physics [43]. Among the many factors that
have been advocated to explain this tendency we find the growth of the research system and the advances in
technology that make collaboration and publishing easier [44, 45, 46, 20].

In order to explicitly consider the complex flow of citations between producers and consumers, we propose
the knowledge diffusion proxy algorithm (see the Methods section for the formal definition). In this algorithm,
producers inject citations into the system, which flow along the edges of the network until they reach consumer
cities, where the injected citations are finally absorbed. The algorithm allows charting the diffusion of knowledge,
going beyond local measures. The entire topology of the network is explored, uncovering nontrivial
correlations induced by global citation patterns. For instance, knowledge produced in a city may be consumed
by another producer that in turn produces knowledge for other cities that are consumers. This points
out that the actual consumer of knowledge is signalled not just by the unbalance of citations but by the overall
topology of the production and consumption of knowledge in the whole network. Indeed, the final consumer
of each injected citation may not be directly connected with the producer. Citations flow along all possible
paths, sometimes through intermediate cities. In Table 1 and Table 2 we report the rankings of the Top
10 final consumers evaluated by the knowledge diffusion proxy for the Top 3 producers in 2009 and 1990,
respectively. We also list the Top 10 neighbours according to the local citation unbalance. From these two
tables, it is clear that the final rank of each consumer obtained by our algorithm can be extremely different
from the ranking obtained by just considering local unbalances. For instance, in 2009 Bratislava and Mainz
rank in the top 10 consumers absorbing knowledge produced in Boston. However, according to the local measure
of unbalance, these two cities are ranked outside the top 10 (shown in bold in Table 1). Interestingly, even the
Top consumer for New Haven, Berlin, does not rank among the Top 10 neighbours according to the
citation unbalance. These findings confirm that in order to uncover the complex set of relationships among
cities, it is crucial to consider the entire structure of the network, going beyond simple local measures.
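
The comparison between the two rankings can be made explicit with a few lines of Python. The helper `promoted_consumers` is a hypothetical name introduced here; the two lists are the Boston columns of Table 1 (2009), read row by row.

```python
def promoted_consumers(diffusion_top, unbalance_top):
    """Cities in the diffusion-proxy top list that are absent from the
    local-unbalance top list, kept in diffusion-proxy rank order."""
    local = set(unbalance_top)
    return [city for city in diffusion_top if city not in local]


# Boston, 2009 (Table 1): top 10 consumers under each criterion.
boston_diffusion = ["Athens", "Madrid", "Vancouver", "Gwangju", "Bratislava",
                    "Berlin", "Trieste", "Mainz", "Paris", "Waco"]
boston_unbalance = ["Madrid", "Athens", "Vancouver", "Moscow", "Paris",
                    "Tokyo", "Trieste", "Beijing", "Berlin", "Gwangju"]

print(promoted_consumers(boston_diffusion, boston_unbalance))
# → ['Bratislava', 'Mainz', 'Waco']
```

Bratislava and Mainz are the two cities discussed in the text; with the lists as transcribed here, Waco also enters the diffusion-proxy top 10 without appearing among the top 10 local neighbours.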

Table 1: Rankings from Knowledge diffusion proxy algorithm for top 3 producer cities in 2009. In bold, we 
highlight cities that are present in top 10 consumers ranked according to the knowledge diffusion proxy but 
do not appear in top 10 cities ranked according to local citation unbalance. 



Boston                                  Berkeley                                New Haven
Diffusion proxy     Citation unbalance  Diffusion proxy     Citation unbalance  Diffusion proxy     Citation unbalance
Athens              Madrid              Athens              Athens              Berlin              Vancouver
Madrid              Athens              Gwangju             Madrid              Athens              Paris
Vancouver           Vancouver           Bratislava          Bratislava          Mainz               Trieste
Gwangju             Moscow              Madrid              Paris               Vancouver           Athens
Bratislava          Paris               Vancouver           Vancouver           Gwangju             Gwangju
Berlin              Tokyo               Trieste             Gwangju             Trieste             Bratislava
Trieste             Trieste             Waco                Moscow              Bratislava          Madrid
Mainz               Beijing             Paris               Trieste             Coventry            Liverpool
Paris               Berlin              Berlin              Seoul               Valencia            Oxford
Waco                Gwangju             Mainz               Waco                Madrid              Santa Barbara



Table 2: Rankings from Knowledge diffusion proxy algorithm for top 3 producer cities in 1990. In bold, we 
highlight cities that are present in top 10 consumers ranked according to the knowledge diffusion proxy but 
do not appear in top 10 cities ranked according to local citation unbalance. 



Piscataway                              Boston                                  Palo Alto
Diffusion proxy     Citation unbalance  Diffusion proxy     Citation unbalance  Diffusion proxy     Citation unbalance
Tokyo               Stuttgart           Tokyo               Tokyo               Tokyo               Tokyo
Beijing             Tokyo               Grenoble            Grenoble            Beijing             Ann Arbor
Tsukuba             Los Angeles         Beijing             Los Angeles         Tsukuba             Bloomington
Grenoble            Urbana              Tsukuba             College Park        Seoul               Boulder
Tallahassee         College Park        Seoul               Los Alamos          Tallahassee         Urbana
Hamilton            Grenoble            Vancouver           Urbana              Charlottesville     Berlin
Buffalo             Rochester           Tallahassee         Boulder             Vancouver           Orsay
Vancouver           Boston              Warsaw              Rochester           Berlin              Denver
Charlottesville     Los Alamos          Kolkata             Vancouver           Durham              Seoul
Tempe               Hamilton            Charlottesville     Bloomington         Taipei              Los Alamos



In Figure 4-A and Figure 4-B we visualize the results considering the Top four producer cities in 2009
in the USA and in Europe, respectively. We show their Top ten consumers over 20 years as a function of time.
The size of each circle is proportional to how many times each injected citation is absorbed by that consumer.
In the plot, vertical grey strips indicate that the city was not a producer during those years (e.g., Orsay in
2008). The results show that, on average, Beijing is the top consumer for all of these producers in the past
20 years. Since China registered strong economic growth and an increase in its research population in the early
2000s, it is reasonable to assume that, thanks to this positive stimulus, many more papers were written in
its capital, a dominant city for scientific research in China. However, the fast publication growth increased
the unbalance between sent and received citations. Each paper published in a given city imports knowledge
from the cited cities. Reaching a balance might require some time. Each city needs to accumulate citations
back to export its knowledge to other cities. We can speculate that in the near future cities in China might
move among the strongest producers if a fair number of papers start receiving enough citations, which
obviously depends on the quality of the research carried out in recent years. This is the case for cities like
Tokyo, which has gradually approached citation balance in recent years. For instance, Table 2 shows
that in 1990 Tokyo was among the top consumers. But by 2009 its contribution to citation consumption
had become less significant, as observed from Figure 4 and Table 1.

Ranking Cities. 

Authors, departments, institutions, governments and many funding agencies are extremely interested in identifying
the most important sources of knowledge. The need for objective measures of the importance
of papers, authors, journals, and disciplines has led to the definition of a wide variety of rankings [23, 24].
Measures such as impact factor, number of citations and h-index [14] are commonly used to assess the importance
of scientific production. However, these common indicators might fail to account for the actual
importance and prestige associated with each publication. In order to overcome these limitations, many different
measures have been proposed [25, 26, 27, 28]. Here we introduce the scientific production ranking
algorithm (SPR), an iterative algorithm based on the notion of diffusing scientific credits. It is analogous to
PageRank [33], CiteRank [26], HITS [25], SARA [29], and other ranking metrics. In the algorithm each
node receives a credit that is redistributed to its neighbours at the next iteration until the process converges
to a stationary distribution of credit over all nodes (see the Methods section for the formal definition). The credits
diffuse following citation links self-consistently, implying that not all links have the same importance. Any
city in the network will be more prominent in rank if it receives citations from high-rank sources. This
process ensures that the rank of each city is self-consistently determined not just by the raw number of citations
but also by whether the citations come from highly ranked cities. In Figure 5 we show the Top 20 cities from
1990 to 2009. Interestingly, we clearly see the decline and rise of cities over the years as well as the steady
leadership of Boston and Berkeley. This behaviour is clear in Figure 6-B, where we show the rank of cities
in the USA in 1990 and 2009. Meanwhile, the ranking of cities in European and Asian countries like France,
Italy and Japan has increased significantly, as shown in both Figure 5 and Figure 6-A. In Figure 6-C we
focus on the geographical distribution of ranks for a selected set of European countries in 1990 and 2009. In
Table 3 we provide a quantitative measure of the change in the landscape of the most highly ranked cities
in the world by showing the percentage of cities in the top 100 ranks for different continents. In Figure 7
we compare the ranking obtained by our recursive algorithm with the ranking obtained by considering the
total volume of publications produced in each city. Since we are considering only journals of the APS, the
impact factor is consistent across all cities and does not include the disproportionate effects that often arise
when mixing disciplines or journals with varied readership. It is then natural to consider a ranking based on
the raw productivity of each place. As we see in the figure, though, the two rankings, although obviously
correlated, provide different results. A number of cities whose ranking according to productivity is in the
Top 20 cities in the world are ranked one order of magnitude lower by the SPR algorithm. Valuing the
number of citations and their origin in the ranking of cities produces results often not consistent with the
raw number of papers, signaling that in some places a large fraction of papers are not producing knowledge,
as they are not cited. We believe that the present algorithm may be considered an appropriate way to rank
scientific production, properly taking into account the impact of papers as measured by citations.
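
The credit-diffusion idea described above can be sketched as a PageRank-style power iteration on the weighted city-to-city network. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `credit_rank`, the uniform random-jump distribution (every city receives the same share 1/N), the damping value q = 0.15 and the convergence settings are all hypothetical choices. Credit flows from citing to cited cities in proportion to link weights, and cities that cite no one redistribute their credit uniformly.

```python
def credit_rank(weights, q=0.15, tol=1e-12, max_iter=1000):
    """PageRank-style credit diffusion on a weighted, directed citation network.

    weights: dict {(i, j): w_ij}, with w_ij > 0 the citations sent by city i to city j.
    Returns {city: score}; scores sum to one.
    """
    edges = {e: w for e, w in weights.items() if w > 0}
    nodes = sorted({n for edge in edges for n in edge})
    out_strength = {v: 0.0 for v in nodes}
    for (src, _dst), w in edges.items():
        out_strength[src] += w

    z = 1.0 / len(nodes)            # assumption: uniform random-jump share
    score = {v: z for v in nodes}
    for _ in range(max_iter):
        # Credit held by cities that cite no one is spread over all cities.
        dangling = sum(score[v] for v in nodes if out_strength[v] == 0.0)
        new = {v: q * z + (1.0 - q) * z * dangling for v in nodes}
        # Each citing city passes its credit to the cities it cites.
        for (src, dst), w in edges.items():
            new[dst] += (1.0 - q) * score[src] * w / out_strength[src]
        change = sum(abs(new[v] - score[v]) for v in nodes)
        score = new
        if change < tol:
            break
    return score


# Toy network (hypothetical): A and B cite C, and C cites A.
rank = credit_rank({("A", "C"): 1.0, ("B", "C"): 1.0, ("C", "A"): 1.0})
for city in sorted(rank, key=rank.get, reverse=True):
    print(city, round(rank[city], 4))
```

In the toy network, A and B each send one citation and C receives two, yet A ranks well above B because its single incoming citation comes from the highly ranked city C. This is the recursive weighting of citations that a raw citation count cannot capture.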
Table 3: Percentage of top 100 ranked cities in continents in 1990 and 2009.

Continent     1990     2009
Asia          4.0%     11.0%
Europe        24.0%    33.0%
N. America    72.0%    56.0%



Discussion 

In this paper we study the scientific knowledge flows among cities as measured by papers and citations
contained in APS [31] journals. In order to make clear the meaning of and difference between producers
and consumers in the context of knowledge, we propose an economic analogy referring to citations as
a currency traded between urban areas. We then study the flow of citations from producers to consumers
with the knowledge diffusion proxy algorithm. Finally, we rank the importance of cities as a function of
time using the scientific production ranking algorithm. This method, inspired by PageRank [33], allows
us to evaluate the importance of cities explicitly considering the complex nature of citation patterns. In
our analysis we considered just scientific publications contained in the APS journals [31]. We do not have
information on citations received from or assigned to papers outside this dataset. These limitations certainly affect
the count of citations of each city, potentially creating biases in our results. However, our findings, while
limited to a particular dataset, are aligned with observations reported by other studies focused on
other datasets and fields. For example, we identify major US cities (e.g., the Boston and San Francisco areas)
as the most important sources of Physics. Similar observations have been made by Börner et al. [1] at
the institution level considering papers published in the Proceedings of the National Academy of Sciences,
by Mazloumian et al. [3] at the country and city level with the Web of Science dataset, and by Batty [8] at both
the institution and country level considering the Institute for Scientific Information (ISI) HighlyCited database.
We also find that some European, Russian and Japanese cities have gradually improved their productivity
and rank in the last twenty years. Similarly, such growth in scientific production has been observed by
King [19] in the ISI database. As discussed in detail in the SI, by aggregating citations of cities to their
respective countries, we find the same correlation between the number of citations, as well as the number of
papers, and the GDP invested in Research and Development of several countries as reported by Pan et al.
based on the ISI database. This agreement between our results and many others in the literature suggests that
the APS dataset, although limited, is representative of the overall scientific production of the largest countries
and cities in the last 20 years. The methodology proposed in this paper could be readily extended to
larger datasets for which the geolocalization of multiple affiliations is possible. In view of the different rates
of publications and citations in different scientific fields, we believe however that the analysis of scientific
knowledge production should only consider homogeneous datasets. This would help the understanding of
knowledge flows in different areas and identify the hot spots of each discipline worldwide.



The dataset of the American Physical Society journals covers papers published between 1893 and
2009, of which 450,655 papers include a list of affiliations [31]. Each paper may have multiple affiliations.
In total there are 945,767 affiliation strings.



Methods 



Dataset.

In order to geolocalize the articles, we parse the city names from the affiliation strings of each article. First,
we process each affiliation string and try to match country or US state names from a list of known names and
their variations in different languages. We crosscheck the results with the Google Maps API, obtaining validated
location information for 97.7% of affiliation strings, corresponding to 445,223 articles. It is worth noticing
that we do not use the Google Maps API (or other map APIs like Yahoo! or Bing) directly for geocoding
because, to the best of our knowledge, there are no accuracy guarantees for the results of these APIs. For each
affiliation string with an extracted country or state name, we also match the city name against the GeoNames
database [47] corresponding to its country or US state. 92.6% of affiliation strings with extracted city names
are subsequently verified with the Google Maps API. Finally, a total of 425,233 articles successfully pass the
filters we describe here.

The dataset also provides 4,710,548 records of citations between articles published in APS journals. To
build citation networks at the city level, we merge the citation links from the same source node to the same
target node, and assign the total number of citations on this link as its weight. For articles with multiple city
names, the weight is equally distributed among the links of these nodes. In total there are 2,765,565 links in the
city-to-city citation networks from 1960 to 2009. (For the full details of parsing country and city names, as well as
building networks, see the Supplementary Information (SI).)

Knowledge diffusion proxy algorithm. 

This analysis tool is inspired by the dollar experiment, originally developed to characterize the flow of
money in economic networks [48]. Formally, it is a biased random walk with sources and sinks in which
a citation diffuses in the network. The diffusion takes place on top of the network of net trade flows.
Let us define $w_{ij}$ as the number of citations that node $i$ gives to $j$ and $w_{ji}$ as the opposite
flow. We can define the antisymmetric matrix $T_{ij} = w_{ij} - w_{ji}$. The network of the net trade is
defined by the matrix $F$ with $F_{ij} = |T_{ij}| = |T_{ji}|$ for all connected pairs with $T_{ij} < 0$ and
$F_{ij} = 0$ for all connected pairs with $T_{ij} > 0$. There are two types of nodes. Producers are nodes
with a positive trade unbalance $\Delta s_i = s_i^{in} - s_i^{out} = \sum_j F_{ji} - \sum_j F_{ij}$: their
strength-in is larger than their strength-out. Consumers, on the other hand, are nodes with a negative
unbalance $\Delta s$. On top of this network a citation is injected in a producer city. The citation follows
the outgoing edges with a probability proportional to their intensities, and the probability that the
citation is absorbed in a consumer city $j$ equals $P_{abs}(j) = \Delta s_j / s_j^{in}$. By repeating this
process many times from each starting point (producer) we can build a matrix with elements $e_{ij}$ that
measure how many times a citation injected in the producer city $i$ is absorbed in a consumer city $j$.
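
The random walk described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the code used for the paper: the function names (`net_flow`, `diffuse`), the toy matrix in the usage note and the `max_steps` guard are hypothetical. The sign conventions are also an assumption of the sketch: a walker follows a net-flow edge $F_{ij}$ from $i$ to $j$, it starts from every node whose net outflow exceeds its net inflow, and it is absorbed at a node with probability equal to the relative excess of net inflow over net outflow.

```python
import random


def net_flow(w):
    """Build the net-flow matrix F from citation counts.

    w[i][j] is the number of citations that city i gives to city j.
    F[i][j] = |T_ij| when T_ij = w_ij - w_ji < 0, and 0 otherwise.
    """
    n = len(w)
    F = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            t = w[i][j] - w[j][i]  # antisymmetric T_ij
            if t < 0:
                F[i][j] = float(-t)
    return F


def diffuse(F, walks_per_producer=1000, seed=0, max_steps=10_000):
    """Count where walkers injected at each source node are absorbed.

    Returns {source: counts}, where counts[j] is the number of walkers
    started at `source` that were absorbed at node j.
    """
    rng = random.Random(seed)
    n = len(F)
    flow_out = [sum(F[i]) for i in range(n)]
    flow_in = [sum(F[i][j] for i in range(n)) for j in range(n)]
    # Assumed absorption rule: relative excess of inflow over outflow.
    p_abs = [
        max(flow_in[j] - flow_out[j], 0.0) / flow_in[j] if flow_in[j] > 0 else 0.0
        for j in range(n)
    ]
    sources = [i for i in range(n) if flow_out[i] > flow_in[i]]
    counts = {i: [0] * n for i in sources}
    for i in sources:
        for _ in range(walks_per_producer):
            node = i
            for _ in range(max_steps):  # hypothetical guard against endless walks
                # Follow an outgoing edge with probability proportional to its weight.
                node = rng.choices(range(n), weights=F[node])[0]
                if rng.random() < p_abs[node]:
                    counts[i][node] += 1
                    break
    return counts
```

For example, with `w = [[0, 1, 0], [5, 0, 0], [3, 2, 0]]` city 0 is the only source, and about 2/7 of its walkers are absorbed in city 1 and about 5/7 in city 2, matching the net inflow that each of those cities retains.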

Scientific production ranking algorithm. 

The scientific production rank is defined for each node i according to this self-consistent equation: 

$$P_i = q\, z_i + (1 - q) \sum_j P_j \frac{w_{ji}}{s_j^{out}} + (1 - q)\, z_i \sum_j P_j\, \delta(s_j^{out}) \qquad (2)$$

$P_i$ is the score of node $i$, $0 < q < 1$ is the damping factor (defining the probability of random jumps
reaching any other node in the network), $w_{ji}$ is the weight of the directed connection from $j$ to $i$,
$s_j^{out}$ is the strength-out of node $j$, and finally $\delta(x)$ is the delta function, which equals 1
for $x = 0$ and 0 otherwise. Here we use the damping factor $q = 0.15$. The first term on the r.h.s. of
Eq. (2) defines the redistribution of credit to all nodes in the network due to the random jumps in the
diffusion. The second term defines the diffusion of credit through the network: each node $i$ gets a
fraction of credit from each citing node $j$ proportional to the ratio between the weight of the link
$j \to i$ and the strength-out of node $j$. Finally, the last term defines the redistribution of credit to
all the nodes in the network due to the nodes with zero strength-out. In the original PageRank the vector
$z$ has all components equal to $1/N$ (where $N$ is the total number of nodes). Each component has the same
value because the jumps are homogeneous. In this case, instead, the vector $z$ considers the normalized
scientific credit given to node $i$ based on its productivity. Mathematically we have:

$$z_i = \frac{\sum_p \delta_{p,i}\, \frac{1}{n_p}}{\sum_j \sum_p \delta_{p,j}\, \frac{1}{n_p}} \qquad (3)$$

where $p$ denotes a generic paper and $n_p$ the number of nodes that have written the paper. It is important
to notice that $\delta_{p,i} = 1$ only if the $i$-th node wrote the paper $p$; otherwise it equals zero.
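
The self-consistent equation for the scientific production rank can be solved numerically by fixed-point iteration. The sketch below is illustrative only: the function name `production_rank`, the stopping rule (`tol`, `max_iter`) and the input layout are assumptions, not taken from the paper. It assumes `w[j][i]` holds the weight of the link from citing node `j` to cited node `i`, and that `papers` lists, for every paper, the nodes that wrote it.

```python
def production_rank(w, papers, q=0.15, tol=1e-12, max_iter=1000):
    """Iterate P_i = q z_i + (1-q) sum_j P_j w_ji / s_j^out
                     + (1-q) z_i sum_j P_j delta(s_j^out).
    """
    n = len(w)

    # z_i: credit 1/n_p for every paper p written by node i, normalized to sum to 1.
    credit = [0.0] * n
    for nodes in papers:
        for i in nodes:
            credit[i] += 1.0 / len(nodes)
    total = sum(credit)
    z = [c / total for c in credit]

    s_out = [sum(row) for row in w]  # strength-out of every node
    P = z[:]                         # start from the productivity vector
    for _ in range(max_iter):
        # Score held by nodes with zero strength-out, redistributed through z.
        dangling = sum(P[j] for j in range(n) if s_out[j] == 0)
        new_P = []
        for i in range(n):
            diffusion = sum(
                P[j] * w[j][i] / s_out[j] for j in range(n) if s_out[j] > 0
            )
            new_P.append(q * z[i] + (1 - q) * diffusion + (1 - q) * z[i] * dangling)
        change = sum(abs(a - b) for a, b in zip(new_P, P))
        P = new_P
        if change < tol:
            break
    return P
```

Because every term redistributes existing score, the scores keep summing to one at each iteration, which gives a simple sanity check on any implementation.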

Figure 2: Spatial distributions of scientific producers and consumers of Physics. The geospatial
distribution of scientific producer and consumer cities. (A) The world map of producers and consumers at the
city level in 1990 (top) and 2009 (bottom). A producer city, for which the relative unbalance
$\Delta S_i > 0$, is coloured on a red scale. A consumer city, with relative unbalance $\Delta S_i < 0$, is
coloured on a blue scale. The darkness of the colour is proportional to the absolute value of the unbalance:
the larger the absolute value, the darker the colour. (B) The map of producer and consumer cities in the
continental United States in 1990 (left) and 2009 (right). (C) The map of producer and consumer cities in
selected European countries in 1990 (left) and 2009 (right). In (B) and (C), a producer city is marked with
a red bar, while a consumer city is marked with a blue bar. The height of each bar scales with
$|\Delta S_i|$. Note that, for visibility, the height of the bars in (C) is not on the same scale as in (B).
Maps in panel A were created using ArcGIS® [34], and maps in panels B and C were created using R [35].

Figure 3: Network structure. The network structures of the city-to-city citation networks. (A) The backbones
($\alpha = 0.1$) of the citation networks at the city level within the United States in 1960, 1990 and 2009
(from left to right). (B) The backbones ($\alpha = 1, 0.1, 0.1$ from left to right) of the citation networks
at the city level within the 27 European Union countries as well as Switzerland and Norway in 1960, 1990 and
2009 (from left to right). In (A) and (B), the color shows the direction of links: if node $i$ cites node
$j$ there is a link starting in blue and ending in red. (C) The cumulative distribution function of the link
weights $F_w(w_{ij}) = P(w > w_{ij})$ for the city-to-city citation networks in 1960, 1990 and 2009 (from
left to right). The maps of the networks in (A) and (B) were created using JFlowMap [42].

Figure 4: Knowledge diffusion proxy results. (A) The Top 4 producer cities in the USA in 2009 and their
Top 10 consumers from the knowledge diffusion proxy algorithm in 1990-2009. (B) The Top 4 producer cities
in the 27 European Union countries as well as Switzerland and Norway in 2009 and their Top 10 consumers
from the knowledge diffusion proxy algorithm in 1990-2009. When a producer city becomes a consumer in
some year, that year is marked with a grey strip. For each producer city $m$ in (A) and (B), the major
consumers of $m$ over the 20 years are plotted as a function of time from 1990 to 2009. The size of the
bubble in position $(Y, c)$ is proportional to the counter $g_{m,c}(Y)$ in that year. The consumer cities
of each producer are ordered according to the total number of counters in 20 years, i.e.,
$\sum_Y g_{m,c}(Y)$.



[Figure 5 image: Top 20 city ranks (vertical axis: rank) in 1990, 1995, 2000, 2005 and 2009, with each city labelled by name.]

Figure 5: Top 20 ranked cities as a function of time. The plot summarizes Top 20 ranked cities in 1990, 
1995, 2000, 2005 and 2009 (from left to right), and relations between the rankings in different years. The 
grey lines are used when the rank of that city drops out of Top 20. 

Figure 6: Geospatial distribution of city ranks. (A) The world map of city ranks in 1990 (left) and 2009
(right). The ranking of each city is represented by color, from blue (high ranks) to white (low ranks).
(B) The map of ranks for cities in the United States in 1990 (left) and 2009 (right). (C) The map of ranks
for cities in the selected European countries in 1990 (left) and 2009 (right). In (B) and (C), each city is
marked with a bar, and the height of each bar is inversely proportional to the ranking position. The Top 3
rank positions in each region are labelled for reference. Note that, for visibility, the height of the bars
in (C) is not on the same scale as in (B). Maps in panel A were created using ArcGIS® [34], and maps in
panels B and C were created using R [35].

Figure 7: Correlation between scientific production ranking and ranking based on the number of
publications in 2009. The x-axis represents the ranking based on the number of papers each city published
in 2009, and the y-axis represents the scientific production ranking of each city in 2009. The solid line
corresponds to the power-law fit of the data with slope -0.98, and separates the space into two regions.
In the region below the line (coloured blue), cities gain better rankings from the scientific production
ranking algorithm even with relatively fewer publications, such as Chicago and Piscataway. In the region
above the line (coloured green), cities have lower rankings from the algorithm even though they have more
papers published, such as Beijing, Berlin, Wako and Shanghai.

Supplementary Information



1 Extracting Geographic Information 

The database of Physical Review publications used in this paper consists of 463,348 articles, each of which
is identified by a unique Digital Object Identifier (DOI). About 97% of these articles (450,655) record the
publishing year, the author(s) of the article, as well as the corresponding affiliation(s). An article may
have more than one affiliation, and the database provides affiliation strings for each article. In total, we
have 945,767 affiliation strings, and we aim to extract country and city information from the affiliation
strings of each article.



We observe that an affiliation string usually stands for a single affiliation, and roughly consists of
several comma-separated fields:

(SUB-INSTITUTE)*, (INSTITUTE), (OTHER INFORMATION)*, (CITY), (OTHER INFORMATION)*,
(COUNTRY/STATE)

where 'SUB-INSTITUTE' means a department, college, institute, or laboratory within an institute, the
asterisk denotes any number of repetitions of the field (including zero), and 'OTHER INFORMATION' usually
means the province (or region) name, postal code, or P.O. Box. For instance,

PHYSICS DEPARTMENT, THE ROCKEFELLER UNIVERSITY, NEW YORK, NEW YORK

THE INSTITUTE FOR PHYSICAL SCIENCES, THE UNIVERSITY OF TEXAS AT DALLAS,
P.O. BOX 688, RICHARDSON, TEXAS

PHYSICS DEPARTMENT, UNIVERSITY OF GUELPH, GUELPH, ONTARIO N1G 2W1, CANADA



Figure 8 shows the probability distribution of the number of comma-separated fields for all affiliation
strings. The mean value of this number is 4.33 and the standard deviation is 1.156. 86% of all affiliation
strings have between 3 and 5 comma-separated fields, while the percentage rises to 97% for those with fewer
than 8 such fields (mean ± 3σ). Therefore, we first assume that an affiliation string with no more than 7
comma-separated fields represents a single affiliation, and that the remaining ones may consist of multiple
affiliations.
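
The threshold of 7 fields follows from the reported mean and standard deviation, since 4.33 + 3 × 1.156 ≈ 7.8. A minimal sketch of the field count and of the resulting test is given below; the function and constant names are illustrative, and ignoring empty fields is an assumption of the sketch.

```python
MEAN_FIELDS = 4.33   # mean number of fields reported in the text
STD_FIELDS = 1.156   # standard deviation reported in the text

# Largest whole number of fields inside mean + 3 standard deviations.
MAX_SINGLE_FIELDS = int(MEAN_FIELDS + 3 * STD_FIELDS)


def count_fields(affiliation):
    """Number of non-empty comma-separated fields in an affiliation string."""
    return len([field for field in affiliation.split(",") if field.strip()])


def looks_like_single_affiliation(affiliation):
    """First-pass assumption: at most 7 fields means a single affiliation."""
    return count_fields(affiliation) <= MAX_SINGLE_FIELDS
```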



1.1 Parsing country names 

We first extract country and U.S. state names from single affiliation strings. To find country names, we
create a dataset of country names other than the U.S. from the ISO 3166 country codes [?], and of the names
of U.S. states from Wikipedia [?]. We manually add to the dataset some historical country names of the 20th
century (e.g., the Soviet Union, Yugoslavia, East Germany). In addition, for some countries we take into
consideration name variations, like the full official name and the name in the official language, and
possible abbreviations, e.g., U.S.S.R. for the Soviet Union, People's Republic of China for China,
Deutschland for Germany, etc.



Based on the above assumptions and observations, for an affiliation string with no more than 7
comma-separated fields, we first search for the field representing a country name, a process we call 'field
match'. For each field in an affiliation string, we eliminate the words containing the digits 0-9, which may
represent a postal code, and then try to match the field against the country names in our country name
dataset.

Figure 8: The probability distribution of the number of comma-separated fields in an affiliation string. The
mean value of this number is 4.33 and the standard deviation is 1.156. The grey area in the plot represents
the band with a width of 3 standard deviations, which implies that most affiliation strings consist of no
more than 7 comma-separated fields.

If there is no field match for an affiliation string, it is possible either that the author did not write a
country name explicitly but some other field, like the institution name, includes one (e.g., RANDAL
MORGAN LABORATORY OF PHYSICS, UNIVERSITY OF PENNSYLVANIA), or that the country name is mixed with other
information in a field, like a city name or a non-numeric postal code (e.g., MAX-PLANCK-INSTITUT
FUR MOLEKULARE PHYSIOLOGIE POSTFACH 500247 D-44202 DORTMUND GERMANY). Moreover, for affiliation strings
with 'field match' results, other fields in the string may also contain country names in the case of
multiple affiliations (e.g., ARGONNE NATIONAL LABORATORY, ARGONNE, ILLINOIS 60439 AND OHIO STATE
UNIVERSITY, COLUMBUS, OHIO). For affiliation strings without field match results, we try to match the
country names word by word in all fields of the string, and for those with some field matched, we match the
country names word by word in the other fields. We call this process 'string match'. If there is a single
match from the above two steps, we assign the matched country name to the affiliation string, and classify
it among the affiliation strings with a unique country name. If multiple country names are matched, we set
the affiliation string aside for later processing.
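
The two matching passes can be sketched as follows. This is an illustration under assumptions rather than the original implementation: `COUNTRY_NAMES` is a tiny hypothetical stand-in for the country and U.S. state name dataset, the function names are invented for the example, and repeated hits of the same name are treated as one match.

```python
import re

# Hypothetical miniature name list; the real dataset is far larger.
COUNTRY_NAMES = {"NEW YORK", "PENNSYLVANIA", "ILLINOIS", "OHIO", "GERMANY", "CANADA"}


def clean_words(field):
    """Upper-case the field and drop words containing digits (e.g. postal codes)."""
    return [word for word in field.upper().split() if not re.search(r"\d", word)]


def names_in_words(words):
    """'String match': names found as runs of consecutive words inside a field."""
    found = []
    for name in sorted(COUNTRY_NAMES):
        parts = name.split()
        k = len(parts)
        if any(words[s:s + k] == parts for s in range(len(words) - k + 1)):
            found.append(name)
    return found


def extract_country(affiliation):
    """Return (status, names) with status 'unique', 'multiple' or 'none'."""
    found, other_fields = [], []
    for field in affiliation.split(","):
        words = clean_words(field)
        if " ".join(words) in COUNTRY_NAMES:   # 'field match': whole field is a name
            found.append(" ".join(words))
        else:
            other_fields.append(words)
    for words in other_fields:                 # 'string match' on the remaining fields
        found.extend(names_in_words(words))
    unique = sorted(set(found))
    if len(unique) == 1:
        return "unique", unique
    return ("multiple", unique) if unique else ("none", [])
```

With this toy list, the Rockefeller University example resolves to a unique NEW YORK by field match, the University of Pennsylvania example resolves to PENNSYLVANIA by string match, and the Argonne and Ohio State example is flagged as having multiple names.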

The above two procedures of 'field match' and 'string match' give a unique country name to 95.11% of
affiliation strings (899,575 out of 945,767), but 1.83% (17,278 out of 945,767) of affiliation strings have
no country name detected. The remaining 3% of affiliation strings either contain more than one country name
or have more than 7 fields, which may represent multiple affiliations.

The next step focuses on 'splitting the multiple affiliations' into single records. The case of an
affiliation string with multiple country names varies. For instance, it may represent one affiliation but
include country names with overlapping words (e.g., Mexico vs. New Mexico for the string match procedure,
as in THE UNIVERSITY OF NEW MEXICO, ALBUQUERQUE NEW MEXICO, and Washington vs. Washington, D.C. for the
field match procedure, as in THE GEORGE WASHINGTON UNIVERSITY, WASHINGTON, D.C.); or some country names may
represent a city, a region or a street (e.g., ST. JOHN'S UNIVERSITY, JAMAICA, NEW YORK); or it may contain
the constituent states of some historical countries (e.g., FACULTY OF CIVIL ENGINEERING, UNIVERSITY OF
BELGRADE, BULEVAR REVOLUCIJE 73, 11000 BEOGRAD, SRBIJA, YUGOSLAVIA). We go through this scenario first,
and try to filter out the affiliation strings that represent a unique affiliation. We assume that two
country names cannot appear in neighboring fields or in neighboring words. Thus, if we find two country
names in neighboring fields, we consider the latter one as the real country name. But if two country names
are in the same comma-separated field, we determine the country name(s) based on their position. We assign
an index to each of the words in that field according to the order of the words. If the number of words
between the first indices of the two country names is less than the number of words of the longer country
name, the longer country name is taken as the country name. For instance, in the above example THE
UNIVERSITY OF NEW MEXICO, ALBUQUERQUE NEW MEXICO, we find two country names in the second field: NEW MEXICO
and MEXICO, with word indices 2 and 3 respectively. The number of words between the two indices is 1, which
is smaller than the length of NEW MEXICO, so we determine that NEW MEXICO is the country name for this
affiliation.
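
The position rule for two names inside the same field can be written down compactly. The sketch below is one reading of that rule: the helper names are hypothetical, the indices are 1-based as in the text, and keeping both names when the rule does not apply is an assumption.

```python
def first_index(words, name_words):
    """1-based index of the first word of a name inside a field, or None."""
    k = len(name_words)
    for start in range(len(words) - k + 1):
        if words[start:start + k] == name_words:
            return start + 1
    return None


def resolve_pair(field, name_a, name_b):
    """Decide which of two names found in the same field to keep."""
    words = field.upper().split()
    a_words, b_words = name_a.split(), name_b.split()
    index_a = first_index(words, a_words)
    index_b = first_index(words, b_words)
    if index_a is None or index_b is None:
        raise ValueError("both names must occur in the field")
    longer = name_a if len(a_words) >= len(b_words) else name_b
    # Overlap: the gap between the first indices is shorter than the longer name.
    if abs(index_a - index_b) < len(longer.split()):
        return [longer]
    return [name_a, name_b]  # assumption: otherwise both names are kept
```

For the field `ALBUQUERQUE NEW MEXICO`, the names `NEW MEXICO` (index 2) and `MEXICO` (index 3) are one word apart, which is less than the two words of `NEW MEXICO`, so only `NEW MEXICO` is kept.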

After performing the multiple-name check described above, we consider the remaining affiliation strings as
consisting of multiple affiliations. We observe that the affiliation strings in this scenario usually
contain elements implying multiplicity, like AND and semicolons. For example:

THE RICE INSTITUTE, HOUSTON, TEXAS AND THE COLLEGE OF THE PACIFIC, STOCKTON, 
CALIFORNIA 

INSTITUTE FOR ADVANCED STUDY, PRINCETON, NEW JERSEY 08540 AND PHYSICS 
DEPARTMENT, CALIFORNIA INSTITUTE OF TECHNOLOGY, PASADENA, CALIFORNIA 

ISTITUTO DI FISICA DELL' UNIVERSITA, ROMA, ITALY; AND ISTITUTO NAZIONALE 
DI FISICA NUCLEARE, SEZIONE DI ROMA, ITALY 

If there are semicolons in an affiliation string, we split the string at the positions of the semicolons.
However, if there is no semicolon but there is an AND, we have to exclude cases like 'DEPARTMENT OF PHYSICS
AND ASTRONOMY'. To do so, we observe that if an AND joins two affiliations, the country name usually
appears shortly before the AND, so we split the string into two parts at an AND if the last word of the
country name before the AND is at most one word away from the AND (we allow one word between the country
name and the AND because of possible non-numeric postal codes), and the AND does not join any two of the
descriptive words of research subjects, which usually appear in the institute and sub-institute
information. We built a list of descriptive words by calculating the frequency with which each word appears
in the first field of all affiliation strings. The 20 most frequent descriptive words are listed in Table 4.
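
A sketch of the AND rule is given below, assuming hypothetical miniature lists: `COUNTRY_NAMES` stands in for the country and state name dataset and `DESCRIPTIVE` for the list of descriptive words. Ignoring purely numeric postal codes when measuring the distance to the AND is an assumption of the sketch.

```python
import re

COUNTRY_NAMES = {"TEXAS", "CALIFORNIA", "NEW JERSEY", "NEW MEXICO", "ITALY"}
DESCRIPTIVE = {"PHYSICS", "ASTRONOMY", "SCIENCE", "ENGINEERING", "CHEMISTRY"}


def norm(token):
    """Upper-case a token and strip surrounding punctuation."""
    return token.strip(",;.").upper()


def country_ends_near(tokens_before):
    """True if a country name ends zero or one word before the AND."""
    words = [norm(t) for t in tokens_before if not re.search(r"\d", t)]
    for gap in (0, 1):  # at most one word between the name and the AND
        end = len(words) - gap
        for name in COUNTRY_NAMES:
            parts = name.split()
            if end >= len(parts) and words[end - len(parts):end] == parts:
                return True
    return False


def split_on_and(affiliation):
    """Split into two affiliations at the first AND that satisfies the rule."""
    tokens = affiliation.split()
    for pos, token in enumerate(tokens):
        if norm(token) != "AND" or pos == 0 or pos == len(tokens) - 1:
            continue
        # Skip an AND that joins two descriptive words (PHYSICS AND ASTRONOMY).
        if norm(tokens[pos - 1]) in DESCRIPTIVE and norm(tokens[pos + 1]) in DESCRIPTIVE:
            continue
        if country_ends_near(tokens[:pos]):
            left = " ".join(tokens[:pos]).rstrip(",; ")
            right = " ".join(tokens[pos + 1:])
            return [left, right]
    return [affiliation]
```

Applied to the Rice Institute example, the sketch splits after TEXAS, while `DEPARTMENT OF PHYSICS AND ASTRONOMY, UNIVERSITY OF NEW MEXICO, ALBUQUERQUE, NEW MEXICO` is left as a single affiliation.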

For affiliation strings with more than 7 fields, e.g.,

CENTER FOR THEORETICAL PHYSICS, DEPARTMENT OF PHYSICS AND ASTRONOMY,
UNIVERSITY OF TEXAS AT AUSTIN, TEXAS 79712; CENTER FOR ADVANCED STUDIES,
DEPARTMENT OF PHYSICS AND ASTRONOMY, UNIVERSITY OF NEW MEXICO, ALBUQUERQUE,
NEW MEXICO 97131; AND MAX-PLANCK-INSTITUT FUR QUANTENOPTIK, D-8046 GARCHING
BEI MUNCHEN, WEST GERMANY

we first split them by semicolons but not by AND. The resulting substrings are then processed step by step,
from field match to string match and possibly the splitting of multiple affiliations, in the same way as an
affiliation string with no more than 7 fields.

Table 4: The top 20 descriptive words of research subjects.

word            frequency    word            frequency
PHYSICS         314266       RESEARCH        55692
SCIENCE         37345        THEORETICAL     32976
ASTRONOMY       32247        ENGINEERING     28179
MATERIALS       27572        PHYSIK          24083
CHEMISTRY       23821        FISICA          23649
FISICA          22711        PHYSIQUE        21928
NUCLEAR         21860        TECHNOLOGY      18769
SCIENCES        16999        APPLIED         16184
THEORETISCHE    12994        MATHEMATICS     10978
SOLID           10351        PHYSICAL        9194

It is worth noting that even after the splitting process, some of the affiliation strings still contain
more than one country name, like

LOS ALAMOS NATIONAL LABORATORY, UNIVERSITY OF CALIFORNIA, LOS ALAMOS, 
NEW MEXICO 

for which the above steps give both California and New Mexico as its country names, or 

INSTITUTE FOR QUANTUM COMPUTING, UNIVERSITY OF WATERLOO, N2L 3G1, WATERLOO, 
ON, CANADA, ST. JEROME'S UNIVERSITY, N2L 3G3, WATERLOO, ON, CANADA, AND 
PERIMETER INSTITUTE FOR THEORETICAL PHYSICS, N2L 2Y5, WATERLOO, ON, CANADA 

of which the first substring after splitting by AND (INSTITUTE FOR QUANTUM COMPUTING, UNIVERSITY
OF WATERLOO, N2L 3G1, WATERLOO, ON, CANADA, ST. JEROME'S UNIVERSITY, N2L
3G3, WATERLOO, ON, CANADA) still contains another affiliation, and there is no further semicolon or
AND to indicate the position at which to split. Figure 8 shows that, on average, affiliation strings
representing a single affiliation consist of four fields; therefore we split the affiliation (sub)strings
that have multiple country names but no semicolon or AND at the position of the country names if the number
of fields between two country names is not smaller than 4. Thus the final country names for the affiliation
strings of the above two examples are 'New Mexico' and three instances of 'Canada', respectively.

To double check the results obtained from the above procedures, we use Google geocoders from the geopy
toolbox [?] to get the country names found by Google Maps, and call this step Google geocoders checking.
Unfortunately, Google geocoders usually cannot geocode affiliation strings that contain department
information or even institution information. To avoid these exceptions, for affiliation strings with more
than three fields we send the last three fields as an address string to the geocoders, and for the others we
input the whole string. Google geocoders return a comma-separated address string for each input. If the
returned string is not empty, we match the country names and the 2-letter or 3-letter abbreviations in our
country name dataset against the returned result. If the matched result represents the same country as the
one we extracted, we say the country name we parsed for this affiliation string is validated. It should be
noted that we do not use Google geocoders (or other geocoders like Yahoo! or Bing) directly to search for
country names because, to the best of our knowledge, there is no evidence to guarantee the accuracy of the
results from these APIs. Thus we perform this checking step to get better accuracy.
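
The input preparation and the comparison against the geocoder output can be sketched without calling any geocoding service. Everything named here is illustrative: `ALIASES` is a hypothetical excerpt of the name and abbreviation dataset, the returned address is passed in as a plain string, and comparing whole comma-separated fields of that string is an assumption.

```python
# Hypothetical excerpt: canonical name -> accepted spellings and abbreviations.
ALIASES = {
    "CANADA": {"CANADA", "CA", "CAN"},
    "GERMANY": {"GERMANY", "DEUTSCHLAND", "DE", "DEU"},
}


def geocoder_query(affiliation):
    """Keep the last three fields when the string has more than three fields."""
    fields = [field.strip() for field in affiliation.split(",")]
    if len(fields) > 3:
        fields = fields[-3:]
    return ", ".join(fields)


def is_validated(extracted_country, returned_address):
    """True if the returned address names the same country as the extracted one."""
    if not returned_address:
        return False  # an empty geocoder result cannot validate anything
    key = extracted_country.upper()
    returned_fields = {field.strip().upper() for field in returned_address.split(",")}
    return bool(ALIASES.get(key, {key}) & returned_fields)
```

For the University of Guelph example, the query sent to the geocoder would be `GUELPH, ONTARIO N1G 2W1, CANADA`, and a returned address ending in `Canada` or `CA` validates the extracted country.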



Figure 9 summarizes the above steps to extract country names from affiliation strings in a flow chart. As
a result, the 3% of affiliation strings with multiple country names or more than 7 fields are finally split
into 46,353 new records. In the end, we obtain 963,206 records of single affiliations, of which 97.68%
(940,896) have a country name validated with Google geocoders. Figure 10 indicates that after 1940 we
parsed validated country names for more than 95% of papers in each year. We use these affiliation strings
with validated country names to build citation networks at the country level after 1940, and as the inputs
to extract city names.

Figure 9: The flow chart of the procedure to extract country name(s) from affiliation strings.

Figure 10: The percentage of papers (DOIs) with validated country names per year. The plot shows that
after 1940 we obtain more than 95% of papers with verified country names (blue bars).

1.2 Parsing city names 

We use the GeoNames database to parse the names of cities in the affiliation strings with identified country
names. The GeoNames database includes geographical data such as the names of villages, cities, and other
types of places in various languages, elevation, population and others from various sources [47]. The
variations of geographic names across languages allow us to identify city names written in languages other
than English. Each place record in the database also includes its country name and possibly the first level
of administrative division (e.g., the states in the United States). We first filter the records that
represent cities (by the feature codes attribute in the GeoNames data), and arrange cities by the names of
countries and US states. For countries like the Soviet Union and Yugoslavia, we combine the cities of their
former constituent countries; for East Germany we simply use the cities in Germany.

The final result of the above section is a set of affiliation strings, each of which has a unique country
name, so we argue that, to the best of our efforts, each affiliation string now represents only one
institution and has one city name, if any. Since each affiliation string now has a validated country name,
we only use the city list of that country, to avoid confusion with identical city names in different
countries.

After cleaning the data, the first step in parsing city names is 'field match', as we performed to find
country names. For each field, we delete the words containing digits and try to match the field against the
city names in the filtered city dataset for that country. If there are matched city names, we list both the
name and the coordinates as outputs; otherwise we perform 'string match' on the affiliation string, trying
to match city names word by word.

As we did to validate country names, we use Google geocoders from the geopy toolbox to check the correctness
of the city names we extract from affiliation strings. The procedure is similar to that for the country
names: the affiliation strings, excluding the department-level information, are given as input to Google
geocoders, and the non-empty results are saved for the next validation step. The coordinates and city names
given by Google geocoders for an affiliation string are based on the name of the institution, and may differ
from the extracted name and from the coordinates of the city given in the GeoNames database. To determine
whether the extracted city name is correct, we simply calculate the geographic distance between the
coordinates given by the GeoNames database and those given by Google geocoders; if the distance is less than
50 km, we say the extracted result matches the Google result. For affiliation strings with multiple city
names, we choose the one with the shortest Vincenty distance from the Google geocoded result.
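
The distance check can be sketched as follows. Note the substitution: the text uses Vincenty's formula on an ellipsoid, whereas this sketch uses the simpler haversine great-circle distance on a sphere, which is an approximation that is adequate for a 50 km threshold. The function names and the coordinates in the usage note are illustrative.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) points."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def validate_city(candidates, geocoded, max_km=50.0):
    """Pick the candidate city closest to the geocoded point.

    candidates: non-empty {city name: (lat, lon)} taken from the gazetteer.
    geocoded:   (lat, lon) returned by the geocoder for the affiliation.
    Returns the city name if it lies within max_km, otherwise None.
    """
    name, coords = min(candidates.items(), key=lambda item: haversine_km(item[1], geocoded))
    return name if haversine_km(coords, geocoded) < max_km else None
```

For instance, a candidate Boston at (42.3601, -71.0589) is accepted when the geocoder returns a point in Newton at (42.3370, -71.2092), since the two are roughly 13 km apart.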



In total, we have 92.6% (871,345 out of 940,896) of affiliation strings with validated city names.
Figure 11a shows the percentage of papers (DOIs) with validated city names per year, from which one can
observe that we obtain validated city names for more than 90% of papers after 1940; for this reason we use
data after that year to perform the analysis at the city level in this paper. Figure 11b displays the
percentage of papers with validated city names relative to the total number of papers for each country after
1940. The abscissa shows 60 countries ordered by their total number of papers after 1940. These top 60
countries contribute 95% of the papers published in Physical Review journals after 1940, as shown by the
cumulative distribution of the total number of papers over all countries (the red dotted curve). From
Figure 11b we claim that, for most of the major countries contributing to publications in Physical Review
journals, the parsing of city names gives unbiased results.




Figure 11: The percentage of papers (DOIs) with validated city names per year (a) and the percentage of
papers (DOIs) with validated city names per country (b). Panel (a) clearly shows that after 1940 we obtain
more than 90% of papers with verified city names for each year (blue bars). In (b) the x-axis shows the top
60 countries ranked by their total number of papers after 1940. The red dotted curve is the cumulative
distribution function of the number of papers over countries after 1940. For the major contributing
countries in terms of paper production, we have obtained more than 80% of papers with validated city names.



So far we have obtained geographic coordinates and city names for the affiliation strings from Google
geocoders and the GeoNames database. However, different city names may represent the same city,
geographically close cities, or different administrative levels. For instance,

DEPARTMENT OF PHYSICS, BOSTON COLLEGE, BOSTON, MASSACHUSETTS 02467, USA

DEPARTMENT OF PHYSICS, BOSTON COLLEGE, CHESTNUT HILL, MASSACHUSETTS

Because Chestnut Hill is not a city in Massachusetts in the GeoNames database, the city name extracted from
these two affiliation strings for Boston College is Boston, while Google geocoders give the city name
Newton. In this case, one cannot automatically determine which city this affiliation should be assigned to.
One possible way to solve such a problem is to project the coordinates onto the polygons of 'cities' in
shapefiles for geographic information systems software. However, the existing shapefiles have different
granularities for different countries, and it may be unfair to compare the scientific production of
administrative units of different levels across countries.

Therefore, we cluster cities according to their geographic coordinates into 'urban areas' or 'academic
cities' in each country. For each country, we perform hierarchical (agglomerative) clustering on the
geographic distance matrix, whose distances are calculated with Vincenty's formula. With the dendrogram
produced by the clustering process, we cut the branches from the maximum height value down to lower ones
until, for all clusters, the distance between any point in a cluster and the centroid of the cluster is less
than 25 km (so the maximum distance within a cluster is 50 km). We call such clusters 'academic cities'. The
final coordinates of an academic city are the centroid of all coordinates inside that cluster, and the
academic city is named after the city that has the most papers in that cluster. We notice that, due to the
differences between geographic areas in different countries, some cities are merged into one academic city
and some other cities are split into two. For instance, Boston, Cambridge, and Newton in Massachusetts are
now clustered into one urban area with the name Boston, and Dubna in Moscow Oblast now becomes a separate
academic city. Finally, we have a list of academic cities for each paper (DOI), and all the analyses we make
at the city level in this paper refer to these urban areas or academic cities.
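
A small pure-Python sketch of this clustering step is shown below. Several choices are assumptions rather than details given in the text: complete linkage is used to build the dendrogram, the haversine distance replaces Vincenty's formula, and the centroid is the plain average of latitudes and longitudes, which is reasonable only for compact clusters away from the poles and the date line. The naive merge loop is meant for small inputs.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) points."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def within_radius(cluster, coords, max_radius_km):
    """True if every point of the cluster is closer than max_radius_km to its centroid."""
    points = [coords[i] for i in cluster]
    centroid = (
        sum(p[0] for p in points) / len(points),
        sum(p[1] for p in points) / len(points),
    )
    return all(haversine_km(p, centroid) < max_radius_km for p in points)


def academic_cities(coords, max_radius_km=25.0):
    """Cluster point indices; return the coarsest partition meeting the radius rule."""
    clusters = [[i] for i in range(len(coords))]
    partitions = [clusters]  # partition after 0, 1, 2, ... merges
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: largest pairwise distance between two clusters.
                d = max(
                    haversine_km(coords[i], coords[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
        partitions.append(clusters)
    # Cut from the top of the dendrogram downwards until all clusters are compact.
    for partition in reversed(partitions):
        if all(within_radius(c, coords, max_radius_km) for c in partition):
            return sorted(partition)
    return partitions[0]
```

With approximate coordinates for Boston, Cambridge, Newton and Worcester in Massachusetts, the first three end up in one cluster and Worcester stays on its own.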

2 Building the citation networks 

A citation network consists of a set of nodes (cities) and directed links representing citations: a paper
written in one city is cited by a paper written in another one, according to the references of the latter.
For example, if a paper written in node $i$ cites a paper written in node $j$, there is an edge from $i$ to
$j$, i.e., $j$ receives a citation from $i$ and $i$ sends a citation to $j$. As shown in Figure 1 in the
main text, a directed link from Ann Arbor to Rome and another link to Madrid are built since paper A, which
is from Ann Arbor, Michigan, cites paper B from Rome, Italy and Madrid, Spain. Because paper A was also
contributed by authors from two other cities, Los Alamos in New Mexico and New York City in New York, from
each of these two cities there is also a link to Rome and another to Madrid.

The weight of a link is defined as follows. In a given time window, the total number of citations that the
papers written in $j$ receive from papers written in $i$ is the weight of the link $(i \to j)$, and the
total number of citations that the papers written in $j$ send to the papers written in $k$ is the weight of
the link $(j \to k)$. For instance, if in time window $t$ there is one paper written in node $j$ that cites
two papers written in node $k$ and is cited by three papers written in node $i$, then $w_{ij} = 3$ and
$w_{jk} = 2$; we add up such weights over all papers written in node $j$ to obtain the weights of the links.
For a paper written in multiple cities, the weight is counted equally among those cities. The time window we
use in this paper is 1 year.
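
The aggregation from paper-level citations to weighted city-to-city links can be sketched as below. The function name and input layout are hypothetical, and the exact sharing rule is an assumption: each citation contributes a total weight of one, split equally over all pairs formed by a city of the citing paper and a city of the cited paper.

```python
from collections import defaultdict


def city_citation_network(paper_cities, citations):
    """Aggregate paper-level citations into weighted city-to-city links.

    paper_cities: {paper id: list of cities of that paper}
    citations:    iterable of (citing paper id, cited paper id)
    Returns {(citing city, cited city): weight}.
    """
    weights = defaultdict(float)
    for citing, cited in citations:
        sources = paper_cities.get(citing, [])
        targets = paper_cities.get(cited, [])
        if not sources or not targets:
            continue  # skip papers without a geolocalized affiliation
        share = 1.0 / (len(sources) * len(targets))  # assumed equal split
        for i in sources:
            for j in targets:
                weights[(i, j)] += share
    return dict(weights)
```

In the Ann Arbor example above, paper A (Ann Arbor, Los Alamos, New York City) citing paper B (Rome, Madrid) produces six links, each carrying a weight of 1/6 under this sharing rule.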



3 Basic properties of data and citation networks 

We observe a significant growth of the published articles and the citations in the last 50 years, as shown
in Figure 12. Meanwhile, the percentage of papers contributed by authors in the United States has decreased
from nearly 90% in the early 1960s to the current 36% (Figure 13). Correspondingly, the number of cities
contributing to publications in APS journals, as well as their internal interactions, has increased
dramatically, as illustrated in Figure 14 and Figure 15.



In Table. 5 we report basic statistic properties for the city-to-city citation networks in selected years. Fig- 



ure. 16a reports the cumulative distribution functions for in- and out-degree of the city-to-city citation net- 
works in different years. The distributions are with behaviors close to power-law with the exponential cutoff. 
As the year increases, the range of values of k m and k ou t extends. We define the in/out-strength of node i 



percentage of papers from USA 



as the total number of citations it sends/receives at that year. Figure. 16b displays the cumulative distribu- 
tion function for in- and out-strength of the city-to-city citation networks in different years. The pattern of 
strength distributions is quite similar to the degree distributions. 
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Figure 12: The number of papers (top) and the Figure 13: The percentage of papers contributed by 
number of citations (bottom) as the function of time authors from USA as the function of time (1960- 
(1960-2009). 2009). 
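As an illustration of the quantities discussed in this section, the following Python sketch computes in/out-degree and in/out-strength from a dictionary of link weights, together with an empirical cumulative distribution. The function names are hypothetical, and the cumulative distribution is taken here in its complementary form P(X >= x), which is an assumption about the convention used in Figure 16.

```python
from collections import defaultdict


def degree_and_strength(weights):
    """weights: {(i, j): w_ij}, where an edge i -> j means i cites j."""
    k_in, k_out = defaultdict(int), defaultdict(int)
    s_in, s_out = defaultdict(int), defaultdict(int)
    for (i, j), w in weights.items():
        k_out[i] += 1   # i cites one more distinct city
        s_out[i] += w   # citations sent by i
        k_in[j] += 1    # j is cited by one more distinct city
        s_in[j] += w    # citations received by j
    return k_in, k_out, s_in, s_out


def ccdf(values):
    """Empirical P(X >= x) for each distinct value x, in ascending order."""
    n = len(values)
    return [(x, sum(v >= x for v in values) / n) for x in sorted(set(values))]


# The example from Section 2: w_ij = 3 and w_jk = 2.
weights = {("i", "j"): 3, ("j", "k"): 2}
k_in, k_out, s_in, s_out = degree_and_strength(weights)
example_ccdf = ccdf([1, 1, 2, 4])
```

For node j in this example the in-strength is 3 and the out-strength is 2, matching the weights given in the text.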



Table 5: Summary of basic statistical features of the city-to-city citation networks in different years. V: number of nodes (cities); E: number of links; k_in and k_out: in- and out-degree; s_in and s_out: in- and out-strength; w_ij: link weight.

year | V | E
1960 | 222 | 2517
1970 | 438 | 9461
1980 | 635 | 17028
1990 | 897 | 43324
2000 | 1327 | 109438
2009 | 1704 | 204747

year | quantity | mean | std. | min | max
1960 | k_in | 11.34 | 18.13 | - | 90
1960 | k_out | 11.34 | 15.20 | - | 84
1960 | s_in | 41.24 | 111.16 | - | 765
1960 | s_out | 41.24 | 95.99 | - | 940
1960 | w_ij | 3.64 | 11.57 | 1 | 336
1970 | k_in | 21.60 | 38.97 | - | 236
1970 | k_out | 21.60 | 26.72 | - | 153
1970 | s_in | 87.53 | 288.39 | - | 2893
1970 | s_out | 87.53 | 198.54 | - | 1758
1970 | w_ij | 4.05 | 13.98 | 1 | 564
1980 | k_in | 26.82 | 47.96 | - | 332
1980 | k_out | 26.82 | 34.84 | - | 206
1980 | s_in | 94.08 | 311.71 | - | 4182
1980 | s_out | 94.08 | 213.94 | - | 2164
1980 | w_ij | 3.51 | 11.02 | 1 | 557
1990 | k_in | 48.30 | 80.31 | - | 539
1990 | k_out | 48.30 | 58.37 | - | 329
1990 | s_in | 207.59 | 671.95 | - | 9125
1990 | s_out | 207.59 | 459.34 | - | 4372
1990 | w_ij | 4.30 | 13.00 | 1 | 830
2000 | k_in | 82.47 | 126.79 | - | 754
2000 | k_out | 82.47 | 102.83 | - | 556
2000 | s_in | 801.76 | 2640.94 | - | 34768
2000 | s_out | 801.76 | 2167.73 | - | 20862
2000 | w_ij | 9.72 | 29.71 | 1 | 1568
2009 | k_in | 120.16 | 178.22 | - | 968
2009 | k_out | 120.16 | 151.16 | - | 822
2009 | s_in | 3033.86 | 9230.21 | - | 104149
2009 | s_out | 3033.86 | 8651.34 | - | 76044
2009 | w_ij | 25.25 | 75.12 | 1 | 3004

Figure 14: The number of nodes (cities) in the city-to-city citation networks as a function of time (1960-2009).

Figure 15: The number of links in the city-to-city citation networks as a function of time (1960-2009).

(a) The cumulative distribution function of the degrees for citation networks at the city level.

(b) The cumulative distribution function of the strength for citation networks at the city level.

Figure 16: The cumulative distribution function of degree and strength for city-to-city citation networks in the years 1960, 1970, 1980, 1990, 2000 and 2009.

4 Top producers/consumers and results from knowledge diffusion proxy 



In Figure 17 we show the cumulative distribution of the absolute citation unbalance $|\Delta s|$ for producers and consumers at the city level. Similar to the cumulative distributions of strength, these distributions are characterized by heavy tails and have become broader over time.

We list the top 20 producers and consumers at the city level from 1985 to 2009 (Table 6) and from 1960 to 1980 (Table 7). It is worth noting that the unbalance $\Delta s$ is defined as the difference between the number of citations sent and received, so it cannot distinguish between cities with large production and consumption and those with small production and consumption.
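A minimal sketch of how such producer and consumer lists could be obtained from the link weights is given below. It assumes the sign convention that the unbalance is citations received minus citations sent, so that producers have a positive value and consumers a negative one; the function names and the toy cities "A", "B", "C" are hypothetical and not taken from the dataset.

```python
def citation_unbalance(weights):
    """Unbalance per city: citations received minus citations sent.

    weights: {(i, j): w_ij}, where an edge i -> j means i cites j.
    The sign convention (received minus sent) is an assumption.
    """
    ds = {}
    for (i, j), w in weights.items():
        ds[i] = ds.get(i, 0) - w  # i sends w citations
        ds[j] = ds.get(j, 0) + w  # j receives w citations
    return ds


def top_producers_consumers(ds, n=20):
    """Rank cities by unbalance; return (top producers, top consumers)."""
    ranked = sorted(ds.items(), key=lambda kv: kv[1], reverse=True)
    producers = [city for city, d in ranked if d > 0][:n]
    consumers = [city for city, d in reversed(ranked) if d < 0][:n]
    return producers, consumers


# Toy network with three hypothetical cities.
weights = {("A", "B"): 5, ("B", "A"): 2, ("C", "B"): 1}
ds = citation_unbalance(weights)
producers, consumers = top_producers_consumers(ds)
```

The limitation noted in the text is visible in this form: a city that receives 1000 citations and sends 999 gets the same unbalance as a city that receives one citation and sends none.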






Figure 17: The cumulative distribution function of the citation unbalance for producers and consumers at the city level in the years 1960, 1970, 1980, 1990, 2000 and 2009.

Table 6: Top 20 producers and consumers at the city level (1985-2009)

(a) Top 20 producer cities

rank | 1985 | 1990 | 1995 | 2000 | 2005 | 2009
1 | Piscataway | Piscataway | Piscataway | Boston | Boston | Boston
2 | Boston | Boston | Boston | Piscataway | New York City | Berkeley
3 | Berkeley | Palo Alto | Yorktown Heights | Los Angeles | Los Angeles | New Haven
4 | Princeton | Yorktown Heights | Berkeley | Berkeley | Tallahassee | Suwon
5 | Yorktown Heights | Berkeley | Los Angeles | Chicago | Palo Alto | Princeton
6 | Ithaca | Princeton | Urbana | New York City | Berkeley | Piscataway
7 | New York City | Ithaca | New York City | Lemont | Piscataway | Higashihiroshima
8 | DC | New York City | Chicago | Urbana | Urbana | Prairie View
9 | Palo Alto | San Diego | Ithaca | Philadelphia | Pavia | Los Angeles
10 | Lemont | Philadelphia | Lemont | Princeton | West Lafayette | Lubbock
11 | Los Angeles | Chicago | Princeton | West Lafayette | Ithaca | Palo Alto
12 | Chicago | Santa Barbara | Palo Alto | Batavia | Rochester | Batavia
13 | San Diego | Pittsburgh | Santa Barbara | Rochester | Honolulu | New York City
14 | Seattle | Lemont | Philadelphia | Yorktown Heights | Batavia | Nashville
15 | Rehovot | Los Angeles | Minneapolis | Palo Alto | Yorktown Heights | Bristol
16 | New Haven | New Haven | San Diego | Dallas | Irvine | Rochester
17 | Urbana | Orsay | Batavia | Tsukuba | Lemont | Urbana
18 | Pittsburgh | Holmdel | Zurich | Waltham | Minneapolis | Daegu
19 | Villigen | Stony Brook | Waltham | Madison | Philadelphia | Tallahassee
20 | Waltham | Batavia | Madison | East Lansing | Boulder | Pittsburgh

(b) Top 20 consumer cities

rank | 1985 | 1990 | 1995 | 2000 | 2005 | 2009
1 | Stuttgart | Tokyo | Moscow | Beijing | Beijing | Athens
2 | Toronto | Beijing | Beijing | Seoul | Barcelona | Gwangju
3 | Gaithersburg | Tsukuba | Seoul | Lancaster | Coventry | Bratislava
4 | Annandale | Tallahassee | East Lansing | Grenoble | Valencia | Vancouver
5 | Bloomington | Vancouver | Lubbock | Dubna | Perugia | Madrid
6 | Minneapolis | Grenoble | Montreal | Manhattan | Moscow | Berlin
7 | Warsaw | Seoul | Tallahassee | Quito | Heidelberg | Trieste
8 | Berlin | Kolkata | Davis | Suwon | London | Mainz
9 | Vancouver | Charlottesville | Dallas | Stillwater | Dubna | Waco
10 | Ames | Durham | Taipei | Santander | Riverside | Paris
11 | West Lafayette | Buffalo | Berlin | Lawrence | Amsterdam | Valencia
12 | Charlottesville | Warsaw | Tokyo | Krakow | Hefei | Coventry
13 | Seoul | Tempe | Toyonaka | Marseille | Dresden | Moscow
14 | Montreal | Berlin | Delhi | Tokyo | Bellaterra | Bellaterra
15 | Trieste | Madrid | Trieste | Karlsruhe | Shanghai | Lanzhou
16 | Kyoto | Sao Paulo | St Petersburg | Daegu | Evanston | Shanghai
17 | Tokyo | Taipei | Dresden | Udine | Taipei | Sao Paulo
18 | Varanasi | Brussels | Bologna | Oxford | Glasgow | Kolkata
19 | Rio De Janeiro | Mainz | Munich | Moscow | Liverpool | Clermont
20 | Ridgefield | Davis | Cambridge | Ruston | Bari | Hefei

Table 7: Top 20 producers and consumers at the city level (1960-1980)

(a) Top 20 producer cities

rank | 1960 | 1965 | 1970 | 1975 | 1980
1 | Boston | Princeton | Berkeley | Boston | Boston
2 | Princeton | Berkeley | Boston | Berkeley | Princeton
3 | Urbana | Boston | Princeton | Palo Alto | Piscataway
4 | Oak Ridge | Piscataway | Chicago | Princeton | Berkeley
5 | Piscataway | New York City | Piscataway | Piscataway | Palo Alto
6 | New York City | Los Angeles | Palo Alto | Ithaca | Ithaca
7 | Los Angeles | Los Alamos | Albany | Chicago | New York City
8 | Los Alamos | Albany | San Diego | Oak Ridge | Chicago
9 | Chicago | Ann Arbor | Madison | San Diego | San Diego
10 | Ithaca | Pittsburgh | New York City | New Haven | Los Angeles
11 | Rochester | Meyrin | Pittsburgh | Los Angeles | Stony Brook
12 | DC | Waltham | Waltham | Urbana | New Haven
13 | Madison | Urbana | Meyrin | Pittsburgh | Philadelphia
14 | Bloomington | Cambridge | Ithaca | Batavia | Albany
15 | Utrecht | Bloomington | Cambridge | Providence | Urbana
16 | Durham | Lemont | Los Angeles | Albany | Albuquerque
17 | London | Ithaca | Los Alamos | Durham | Waltham
18 | Saskatoon | DC | New Haven | Rochester | Batavia
19 | Sydney | Chicago | Livermore | Livermore | College Park
20 | St Louis | Zurich | London | DC | Pittsburgh

(b) Top 20 consumer cities

rank | 1960 | 1965 | 1970 | 1975 | 1980
1 | Berkeley | West Lafayette | Evanston | Stony Brook | Austin
2 | Palo Alto | Palo Alto | West Lafayette | Grenoble | Boulder
3 | New Haven | Orsay | Austin | Columbus | Tokyo
4 | Pittsburgh | College Park | Trieste | Stuttgart | Haifa
5 | Waltham | Albuquerque | Columbus | Toronto | Toronto
6 | San Diego | Livermore | Delhi | Austin | Bhubaneswar
7 | Lemont | Delhi | Amherst | East Lansing | Rehovot
8 | Livermore | Minneapolis | Rochester | Amherst | Ottawa
9 | West Lafayette | Trieste | Milwaukee | Mumbai | Paris
10 | Poughkeepsie | Providence | Baton Rouge | Denton | Santa Barbara
11 | Evanston | Ames | Buffalo | Mexico City | Houston
12 | Tallahassee | Rochester | Seattle | Munich | Golden
13 | Columbus | Evanston | Salt Lake City | Paris | Stuttgart
14 | Canberra | San Diego | Haifa | Honolulu | Kolkata
15 | Yorktown Heights | Syracuse | Hoboken | Montreal | Toyonaka
16 | Arlington | Rehovot | Lincoln | Orsay | Kyoto
17 | Rome | Hoboken | Gainesville | Roskilde | Grenoble
18 | Meyrin | Oxford | Tucson | Madison | Jülich
19 | Ames | El Segundo | Bloomington | West Lafayette | Vancouver
20 | Irvine | Milan | East Lansing | Rehovot | Kingston

5 Top ranked cities from scientific production ranking algorithm 



We show the cumulative distribution of scientific production ranking scores for cities in selected years in Figure 18. The ranking scores are also characterized by heavy-tailed distributions. In addition, we observe that both the maximum and minimum ranking scores have decreased with time, and that the tail of the distribution has become steeper in recent decades, which indicates that the differences in ranking scores between top-ranked cities have gradually shrunk.




Figure 18: The cumulative distribution function of scientific production ranking scores for cities in year 
1960, 1970, 1980, 1990, 2000 and 2009. 



In Table 8 and Table 9 we report the top 50 cities according to the scientific production ranking algorithm from 
1985 to 2009 and from 1960 to 1980, respectively. 






Table 8: Top 50 cities from scientific production ranking algorithm (1985-2009)

rank | 1985 | 1990 | 1995 | 2000 | 2005 | 2009
1 | Piscataway | Piscataway | Boston | Boston | Boston | Boston
2 | Boston | Boston | Piscataway | Berkeley | Los Angeles | Berkeley
3 | Berkeley | Berkeley | Berkeley | Piscataway | Berkeley | Los Angeles
4 | Palo Alto | Palo Alto | Los Angeles | Los Angeles | Orsay | Tokyo
5 | New York City | Yorktown Heights | New York City | New York City | Tokyo | Orsay
6 | Los Angeles | Los Angeles | Urbana | Chicago | Princeton | Chicago
7 | Ithaca | New York City | Chicago | Urbana | Piscataway | -
8 | Los Alamos | Los Alamos | Lemont | Rochester | Palo Alto | Princeton
9 | Princeton | Princeton | Palo Alto | Batavia | New York City | -
10 | Yorktown Heights | Urbana | Batavia | West Lafayette | Philadelphia | Piscataway
11 | Lemont | Chicago | Philadelphia | Lemont | Urbana | London
12 | Urbana | Philadelphia | Madison (?) | - | Santa Barbara | Urbana
13 | Chicago | Ithaca | Rochester | East Lansing | Rome | Lemont
14 | Philadelphia | Lemont | West Lafayette | Ann Arbor | Columbus | Philadelphia
15 | - | - | - | Tokyo | College Park | Oxford
16 | DC | Santa Barbara | Princeton | College Station | New Haven | Santa Barbara
17 | College Park | College Park | Los Alamos | Tsukuba | Lemont | New Haven
18 | Oak Ridge | Oak Ridge | Rome | Philadelphia | Madison | Rochester
19 | Santa Barbara | Livermore | Tsukuba | Palo Alto | Paris | Madison
20 | Rochester | Batavia | Santa Barbara | Madison | San Diego | Columbus
21 | Rehovot | Tokyo | Yorktown Heights | College Park | Chicago | College Park
22 | San Diego | Rochester | College Station | Pittsburgh | Tsukuba | Batavia
23 | Pittsburgh | San Diego | Pittsburgh | - | Oxford | Moscow
24 | New Haven | Columbus | Ithaca | Princeton | Oak Ridge | East Lansing
25 | Stony Brook | Madison | College Park | Los Alamos | Tallahassee | Palo Alto
26 | Seattle (?) | Pittsburgh | New Haven | New Haven | Rochester | Pittsburgh
27 | Columbus (?) | DC | Ann Arbor | Toyonaka (?) | Beijing | San Diego
28 | Boulder | Rehovot | - | Durham | Pittsburgh | Ann Arbor
29 | - | Stuttgart | Waltham | Columbus | - | Tsukuba
30 | Livermore | Paris | East Lansing | Stony Brook | West Lafayette | Seoul
31 | Madison (?) | Minneapolis | Oak Ridge | Santa Barbara | Batavia | -
32 | Austin | Boulder | Tokyo | Albuquerque | Pisa | West Lafayette
33 | - | New Haven | Stony Brook | Baltimore | Boulder | Padua
34 | Munich (?) | West Lafayette | San Diego | Toronto | Padua | Dubna
35 | Zurich | Stony Brook | Minneapolis | Pisa | London | Evanston
36 | - | Bloomington | Baltimore | Tallahassee | Montreal | Ames
37 | Bloomington | Seattle | Padua | Waltham | Livermore | New York City
38 | Minneapolis | Ann Arbor | Toronto | Ithaca | Los Alamos | Toronto
39 | West Lafayette | Austin | Boulder | Moscow | Seoul | Oak Ridge
40 | Ann Arbor | Zurich | Albuquerque | Montreal | East Lansing | Baltimore
41 | East Lansing | Vancouver | Stuttgart | Padua | Moscow | Beijing
42 | Stuttgart | Holmdel | Livermore | San Diego | Nashville | Karlsruhe
43 | Evanston | Rome | DC | Ames | Ann Arbor | Taipei
44 | Grenoble | Ames | Paris | Evanston | College Station | College Station
45 | Syracuse | Waltham | Seattle | Meyrin | Vancouver | Meyrin
46 | Providence | Albuquerque | Rehovot | Gainesville | Irvine | Los Alamos
47 | Ames | Toyonaka | Durham | Honolulu | Taipei | Toyonaka
48 | Albany | Albany | Toyonaka | Paris | Dallas | Liverpool
49 | Waltham | Jülich | Columbus | Oak Ridge | Meyrin | Davis
50 | Nashville | Grenoble | Dallas | Bloomington | Cincinnati | Amsterdam

Table 9: Top 50 cities from scientific production ranking algorithm (1960-1980)

rank | 1960 | 1965 | 1970 | 1975 | 1980
1 | Berkeley | Berkeley | Boston | Boston | Boston
2 | Boston | Boston | Berkeley | Piscataway | Piscataway
3 | New York City | Princeton | Piscataway | Berkeley | Berkeley
4 | Princeton | Piscataway | Palo Alto | Palo Alto | Palo Alto
5 | Chicago | New York City | Princeton | New York City | New York City
6

6 Relation between research outputs and investment 



In this section, we report the relation between research output (i.e., citations) and investment in scientific research. As discussed earlier, we parsed the city information based on the country information of each affiliation; we can therefore aggregate the number of citations of cities to their countries and measure the relation between research output and investment in research in each country. In Figure 19 we plot the average number of citations received by each country in 1996-2009 against the average amount of gross domestic product (GDP) spent on research and development (R&D) (in current US dollars) in that country over the same period. We also plot the average number of citations received by each country against its average research population within the same time window. The number of citations received scales approximately linearly with both quantities. These findings are consistent with the results reported in [7], which studied the database of the Institute for Scientific Information (ISI). This similarity indicates that, although the APS dataset is limited, it is representative of the scientific production of major countries. The data on GDP, the fraction of GDP spent on R&D, and the research population are from the World Bank [32].

Figure 19: Relation between research output and investment. (A) The average number of citations received by each country as a function of the average GDP spent on research and development (R&D), in million US dollars, from 1996 to 2009. (B) The average number of citations received by each country as a function of the average research population in that country from 1996 to 2009. The solid black lines show power-law fits with exponents 1.1 and 1.3, respectively.
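A power-law relation of the form y = c x^beta can be estimated by ordinary least squares on the logarithms of both quantities. The sketch below shows this standard procedure; it is an assumption that a fit of this kind underlies Figure 19, since the fitting method is not described here. The function name `fit_power_law` is hypothetical and the input data are synthetic, generated with a known exponent purely to check the routine.

```python
import math


def fit_power_law(x, y):
    """Least-squares fit of log10(y) = log10(c) + beta * log10(x).

    Returns (c, beta) for the model y = c * x**beta.
    All values of x and y must be strictly positive.
    """
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    beta = sxy / sxx                      # slope in log-log space
    c = 10 ** (mean_y - beta * mean_x)    # prefactor from the intercept
    return c, beta


# Synthetic data (not from the paper): y = 2 * x**1.1 without noise.
x = [1e2, 1e3, 1e4, 1e5]
y = [2.0 * v ** 1.1 for v in x]
c, beta = fit_power_law(x, y)
```

On noise-free synthetic input the routine recovers the exponent used to generate the data; on real country-level data the estimate would also depend on the range of x covered and on how countries with zero citations are handled.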
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